City College of San Francisco
MATH 108 - Foundations of Data Science
Associated Textbook Sections: 2.0, 2.1, 2.2, 2.3, 2.4, 2.5
Image and Headline Source: everydayhealth.com
Study Source: European Journal of Preventive Cardiology
Is there an association between chocolate consumption and heart disease risk?
Yes, the reviewed article in the European Journal of Preventive Cardiology concludes that those who consumed chocolate more than 1 time per week or more than 3.5 times per month had fewer cases of heart disease than those who didn't.
Does chocolate consumption lead to a reduction in heart disease? This question is often harder to answer.
Not necessarily. Several other factors could explain why fewer people who consumed chocolate regularly developed heart disease. For example, greater financial freedom could explain both the ability to buy more foods like chocolate and better access to health care, which in turn could explain fewer cases of heart disease.
“Dr. Alice Lichtenstein, an American Heart Association volunteer and professor of nutrition science and policy at Tufts University, was more skeptical of the findings.”
Image Source: Wikipedia - 1854 Broad Street Cholera Outbreak
This might seem strange ...
Image and Text Source: National Geographic - Mapping A London Epidemic
According to the National Geographic Society,
"This map of London was created by John Snow in 1854. London was experiencing a deadly cholera epidemic, when Snow tracked the cases on this map. The cholera cases are highlighted in black. Using this map, Snow and other scientists were able to trace the cholera outbreak to a single infected water pump."
from IPython.display import IFrame
IFrame(src="https://www.google.com/maps/embed?pb=!1m18!1m12!1m3!1d2482.9971371478814!2d-\
0.13879218398430104!3d51.51326851809472!2m3!1f0!2f0!3f0!3m2!1i1024!2i768!4f13\
.1!3m3!1m2!1s0x487604d4eb49ec6d%3A0xc4ff84518f83499d!2sJohn%20Snow!5e0!3m2!1\
sen!2sus!4v1642117611191!5m2!1sen!2sus",
width=800, height=600)
Image Source: British Library - John Snow's map showing the water supply in London, 1855
Image NOTE:
“… there is no difference whatever in the houses or the people receiving the supply of the two Water Companies, or in any of the physical conditions with which they are surrounded …”
The two groups were similar except for the treatment.
from datascience import *
import numpy as np
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
%matplotlib inline
snows_table = Table(['Supply Area', 'Number of Houses', 'Cholera Deaths']).with_rows([
['S&V', 40046, 1263],
['Lambeth', 26107, 98],
['Rest of London', 256423, 1422]
])
snows_table
| Supply Area | Number of Houses | Cholera Deaths |
|---|---|---|
| S&V | 40046 | 1263 |
| Lambeth | 26107 | 98 |
| Rest of London | 256423 | 1422 |
To compare the death totals across supply areas, calculate the number of deaths per house in each area.
# Deaths per house in each supply area (element-wise division of two columns)
death_per_house = snows_table.column('Cholera Deaths') / snows_table.column('Number of Houses')
snows_table.with_column('Deaths per House', death_per_house)
| Supply Area | Number of Houses | Cholera Deaths | Deaths per House |
|---|---|---|---|
| S&V | 40046 | 1263 | 0.0315387 |
| Lambeth | 26107 | 98 | 0.00375378 |
| Rest of London | 256423 | 1422 | 0.00554552 |
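One way to read these rates is as a ratio between supply areas. The sketch below uses plain Python, so it runs without the `datascience` library, and takes its counts from the table above. The names `supply_areas`, `rates`, and `rate_ratio` are illustrative and not part of the original notebook.

```python
# Counts copied from the table above: (number of houses, cholera deaths)
supply_areas = {
    'S&V': (40046, 1263),
    'Lambeth': (26107, 98),
    'Rest of London': (256423, 1422),
}

# Deaths per house for each supply area
rates = {area: deaths / houses for area, (houses, deaths) in supply_areas.items()}

# How many times higher was the S&V death rate than the Lambeth death rate?
rate_ratio = rates['S&V'] / rates['Lambeth']
print(round(rate_ratio, 1))  # roughly 8.4
```

A ratio like this compares the areas directly and does not depend on how the rates are scaled for display.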
Scale and round the rates to show whole numbers.
# Rescale to deaths per 10,000 houses, then round to whole numbers for display
deaths_per_10000_houses = 10000 * death_per_house
snows_table.with_column('Deaths per 10,000 Houses', np.round(deaths_per_10000_houses))
| Supply Area | Number of Houses | Cholera Deaths | Deaths per 10,000 Houses |
|---|---|---|---|
| S&V | 40046 | 1263 | 315 |
| Lambeth | 26107 | 98 | 38 |
| Rest of London | 256423 | 1422 | 55 |
Scaling rates is a common presentation technique. This can provide clarity, but it can also be misleading!
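As a sketch of how the choice of scale can mislead, the same rates are rounded below at two different scales. The per-100 scale and the helper `scaled_and_rounded` are hypothetical choices made only for this illustration; the counts come from Snow's table.

```python
# Deaths per house, computed from the counts in Snow's table
death_per_house = {
    'S&V': 1263 / 40046,
    'Lambeth': 98 / 26107,
    'Rest of London': 1422 / 256423,
}

def scaled_and_rounded(rates, scale):
    """Return each rate multiplied by `scale` and rounded to a whole number."""
    return {area: round(scale * rate) for area, rate in rates.items()}

per_10000 = scaled_and_rounded(death_per_house, 10000)
per_100 = scaled_and_rounded(death_per_house, 100)  # hypothetical, coarser scale

print(per_10000)  # {'S&V': 315, 'Lambeth': 38, 'Rest of London': 55}
print(per_100)    # {'S&V': 3, 'Lambeth': 0, 'Rest of London': 1}
```

At the coarser scale, rounding makes Lambeth appear to have no deaths at all, even though 98 deaths were recorded there.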
If the treatment and control groups are similar apart from the treatment, then differences between the outcomes in the two groups can be ascribed to the treatment.
If the treatment and control groups have systematic differences other than the treatment, then it might be difficult to identify causality.
Such differences are often present in observational studies.
When they lead researchers astray, they are called confounding factors.
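A small sketch can make the idea of a confounding factor concrete for the chocolate question. Every count below is invented purely for illustration and does not come from the reviewed study; income group is assumed to be the confounding factor, and `groups` and `disease_rate` are hypothetical names.

```python
# HYPOTHETICAL counts, invented for illustration only (not from any study).
# Each entry: (income group, eats chocolate regularly?, people, heart disease cases)
groups = [
    ('higher income', True, 800, 40),
    ('higher income', False, 200, 10),
    ('lower income', True, 200, 30),
    ('lower income', False, 800, 120),
]

def disease_rate(rows):
    """Heart disease cases divided by people, pooled over the given rows."""
    people = sum(row[2] for row in rows)
    cases = sum(row[3] for row in rows)
    return cases / people

# Pooled over both income groups, chocolate eaters look healthier ...
overall_eaters = disease_rate([r for r in groups if r[1]])
overall_non_eaters = disease_rate([r for r in groups if not r[1]])
print(overall_eaters, overall_non_eaters)  # 0.07 0.13

# ... but within each income group the two rates are identical.
for income in ('higher income', 'lower income'):
    eaters = disease_rate([r for r in groups if r[0] == income and r[1]])
    non_eaters = disease_rate([r for r in groups if r[0] == income and not r[1]])
    print(income, eaters, non_eaters)
```

In this made-up data, chocolate has no effect within either income group. The overall association appears only because the higher-income group both eats more chocolate and has a lower rate of heart disease.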
Regardless of what the dictionary says, in probability theory